Pinterest open source: early shuffle deletion#3564
Pinterest open source: early shuffle deletion#3564YaoRazor wants to merge 8 commits intoapache:branch-0.4from
Conversation
|
|
||
| trait RunningStageManager { | ||
| def isRunningStage(stageId: Int): Boolean | ||
| } |
There was a problem hiding this comment.
There was a problem hiding this comment.
this is a version based on the original implementation of that PR, I will clean it up to merge to master branch
|
Is there any design doc for this feature? This feature is what I needed. |
|
@FMX I will post the design description/doc while trying to merge this feature to master branch (as it was developed based on 0.4 internally)... on the other side, I am discussing with @YaoRazor , it seems we missed certain important pieces of code when upstreaming, so please do not bother look at the current implementation, it missed many things |
73dc785 to
c0c4019
Compare
|
I have started merging code to main branch at #3569, should be ready before the new year |
|
Hi, all #3569 is ready to review , appreciate the feedbacks in advance |
mridulm
left a comment
There was a problem hiding this comment.
I did not go over the entire PR.
From a quick look, this appears to be an unsound change. Shuffle gets used not just in the currently active stages/jobs - but also 'future' jobs. AQE based sql query execution, RDD reuse, etc all leverage this - I want to make sure I am not missing something here.
|
|
||
| import org.apache.spark.scheduler.{EventLoggingListener, SparkListenerInterface} | ||
|
|
||
| object CelebornSparkContextHelper { |
There was a problem hiding this comment.
Cleanup methods from this object which are not required ?
It looks like the only thing needed is eventLogger right now.
|
|
||
| import org.apache.spark.SparkContext | ||
|
|
||
| trait RunningStageManager { |
There was a problem hiding this comment.
Where is this being used ?
| private def dagScheduler = SparkContext.getActive.get.dagScheduler | ||
|
|
||
| override def isRunningStage(stageId: Int): Boolean = { | ||
| dagScheduler.runningStages.map(_.id).contains(stageId) |
There was a problem hiding this comment.
This is unsafe - runningStages is expected to be used only from within DAGScheduler
|
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
|
This issue was closed because it has been staled for 10 days with no activity. |

What changes were proposed in this pull request?
This PR represents Pinterest work to aggressively delete shuffle data
Ack
The main work is done by @CodingCat while he is working here, I am currently rolling out the feature and see good results
Why are the changes needed?
Does this PR resolve a correctness bug?
Does this PR introduce any user-facing change?
How was this patch tested?